This is meant to be a simple guide to help otherwise busy paleoecologists make use of some helpful tools for conducting research and publishing. All the while, you will also be making your work reproducible.
For this workshop we will be using two separate but related pieces of open-source software: R and R Studio.
R is standalone open-source software for statistical analysis. It has an interesting history (you can find more here). It is both a programming language and an environment within which you can use the language to execute commands. Thus, it is described as the “R Statistical Computing Environment”. R is an instantiation of a tiny universe with many rules, few (or none!) of which you know.
R Studio is a wrapper for R that allows us to see a bit more of what is going on inside of R and to control it through the window rather than exclusively through the “console”. R Studio knows the rules so you don’t have to. Much like a spell-checker in word processing, R Studio checks your code for you. More on this later.
First, you need to install R from cran.r-project.org. Click this link or the logo below. At the top of the page you will find a section titled “Download and Install R”; choose the appropriate download for your operating system (OS).
It should look something like this:
Figure 1. cran.R-project website landing page.
Once you’ve selected the R version for your OS, you’ll be given some download options. Select “base” and download the .exe file (Windows).
Figure 2. cran.R-project windows download options, select ‘base’.
Double-click on this file and follow the instructions from your machine’s prompts for installation. This differs slightly between each operating system and the version of the operating system you’re using.
If you’re using macOS, you will be directed to a slightly different-looking page with multiple download options. Here, you must choose a package based on the version of macOS you’re using. If you’re using macOS 11 or later (Apple names its releases, so this one is ‘Big Sur’), select the top option. If you’re using an earlier version (such as ‘High Sierra’), download the second option.
Figure 3. cran.R-project macOS download options
For Windows users, R will install to your “Program Files” folder. For macOS users, once the package is opened you will need to drag R.app into your Applications folder.
Second, let’s open up R to make sure that it works and to look at some important features that will help you later on. When you open R, you only get a text window called the console. This is where all of the action happens and as soon as you open R, you’re given some basic, but important information.
Figure 4. R Console with annotations for working directory, version, and citation instructions.
Note the following:
- The working directory is the folder where R looks when it goes to find or write things.
- The software version is at the top of the initialization text. This information is important for citations.
- The initialization text includes instructions for getting information about licensing, help, and for citing R.
- The command line is the line starting with “>”, which is where you politely ask R to do things for you.
We can actually try out a bit of coding at this point. Type the code below in your R console and hit “enter”. You can also copy-paste from this document into the console.
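If you would rather not hunt through the startup text, the first two items can also be looked up from the command line. This is a minimal sketch using two things that ship with R, `getwd()` and `R.version.string`; what they print will depend on your own machine, so no output is shown here.

```r
# Folder where R currently looks for and writes files (the working directory)
getwd()

# The version text shown at the top of the startup message
R.version.string
```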
citation()
This can also be written as:
citation("base")
Both these commands give us the same result.
##
## To cite R in publications use:
##
## R Core Team (2020). R: A language and environment for statistical
## computing. R Foundation for Statistical Computing, Vienna, Austria.
## URL https://www.R-project.org/.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2020},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.
Congratulations! If this is your first time using R, then you just ran your first function!
Note that citation("base") gives us the same output as citation() because, when no package is named, citation() defaults to “base”. The R Statistical Computing Environment comes with a lot of basic functions built in (hence “base”). Later, we will explore how we can expand R with “packages” written and maintained by other users.
We have accumulated some vocabulary at this point and it is helpful to explain these terms and how they relate to each other here. R will do exactly what you tell it to, so it helps to know how it thinks. The R environment runs almost entirely by creating objects and applying functions to them. As we saw above, functions make things happen. In order for R to do things with an object, it has to know that the object exists.
We could use “base” above because citation() is already a part of R, so R knows where to find it (it’s always an object). Let’s learn how to create an object.
All coding languages run on syntax (rules for combining things for communication). Here are some key symbols in R syntax.
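The closing lines of the citation output mention citation("pkgname") for citing packages. As a small sketch of how that looks in practice, the lines below use “stats” as the package name. That choice is only an illustration, picked because “stats” ships with R and needs no installation. The last line uses toBibtex() to print only the BibTeX entry.

```r
# With no argument, citation() uses package = "base"
citation()

# Name a package to get its citation; "stats" is just an example that ships with R
citation("stats")

# Print only the BibTeX entry for R itself
toBibtex(citation())
```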
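Before creating anything yourself, you can ask R whether it already knows a name. The sketch below uses exists() and is.function(), both built into R. The name “pollen_counts” is made up for illustration and is assumed not to exist in your session.

```r
# R already knows the name "citation", and it is a function
exists("citation")
is.function(citation)

# A made-up name that has not been created, so R does not know it
exists("pollen_counts")
```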
| Syntax | Action |
|---|---|
| = | equals sign is used to assign data to objects |
| <- | arrow-dash is the same as equals sign, assigns data to objects |
| # | hashes designate non-coding regions, used to annotate code |
You can copy-paste the entire section of code below and run it. Another nice thing about R is that you can submit a whole list of commands at once, as long as each of these commands and functions is entered correctly. Because my annotations are preceded by a hash “#”, they’re not read as commands.
# Here, we use "=" to create an object called "x" and assign it the value of 5.
x = 5
# We can also use "<-" to create another object called "y" and assign it the value of 6.
y <- 6
Once you run the above code, you may notice that basically nothing happened. R happily ran your commands and created an object named “x” with a value of 5 and an object named “y” with a value of 6. You didn’t tell R to give you any output, so none was given. Type “x” in the command line and then hit “enter”. Do the same for “y”. R should return the value each time you hit “enter”. This is rudimentary, but you are coding now. Also, now that we’ve experienced what R is like, we can gain a better appreciation for what R Studio does for us.
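To go one small step further, the sketch below repeats the two assignments, prints each object by name, and then applies a calculation to them. The object name “total” is my own choice for illustration; ls() is a built-in function that lists the objects you have created in the current session.

```r
x = 5
y <- 6

# Typing an object's name on its own prints its stored value
x
y

# Objects can be combined, and the result stored as a new object
total <- x + y
total

# List the objects created so far in this session
ls()
```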
Third, you will need to download R Studio and install it. R Studio is available as a “free version” and a “professional” version. The links here go directly to the free version.
The webpage should detect what machine you’re using and suggest the correct version of R Studio. If it does not, there are other download options below which correspond to various OS and OS versions. You will notice that the webpage also tells you to install R (which we’ve already done), so you can skip to “2” and download R Studio.
Figure 5. R Studio Download Page.